DAG Summary - Experimental Causal Structure

Author

Dan Swart


DAG Rendering Using DiagrammeR

DiagrammeR is used here only to render the DAG; the formal analysis follows below.

Show the code
library(DiagrammeR)

grViz("
  digraph DAG {
    # Graph settings
    graph [layout=neato, margin=\"0.0, 0.0, 0.0, 0.0\"]  # Margins (format: \"top,right,bottom,left\")
    
    # Add a title using a simple label approach
    labelloc=\"t\"
    label=\"Experimental Causal Pathways\\nExamining direct relationships in an experimental structure\\n   \\n\"
    fontname=\"Cabin\"
    fontsize=16
    
    # Node settings
    node [shape=plaintext, fontsize=16, fontname=\"Cabin\"]
    
    # Edge settings
    edge [penwidth=1.50, color=\"darkblue\", arrowsize=1.00]
    
    # Nodes with exact coordinates
    X [label=\"X (treatment)\", pos=\"1.0, 2.0!\", fontcolor=\"royalblue\"]
    Y [label=\"Y (outcome)\", pos=\"3.0, 2.0!\", fontcolor=\"dodgerblue\"]
    Z [label=\"Z\", pos=\"2.0, 3.0!\", fontcolor=\"red\"]
    C [label=\"C\", pos=\"2.0, 1.0!\", fontcolor=\"purple\"]
    A [label=\"A\", pos=\"1.0, 3.0!\", fontcolor=\"purple\"]
    B [label=\"B\", pos=\"3.0, 3.0!\", fontcolor=\"purple\"]
    
    # Edges
    X -> Y
    Z -> Y
    C -> Y
    B -> Y
    Z -> X [style=invis]
    C -> X [style=invis]
    A -> X [style=invis]
    A -> Z
    B -> Z
    
    # Caption as a separate node at the bottom
    Caption [shape=plaintext, label=\"Figure 1: Experimental Causal Structure\", 
             fontsize=10, pos=\"2,0.0!\"]
  }
  ")


DAG Visualization using ggdag and dagitty

Show the code
# Load packages
library(ggdag)
library(ggplot2)

# Define the DAG
causal_salad_dag4 <- ggdag::dagify(
  Y ~ X, 
  Y ~ C,
  Y ~ Z,
  Y ~ B,
  Z ~ A + B,
  exposure = "X",
  outcome = "Y",
  coords = list(x = c(X = 1, Y = 3, Z = 2, C = 2, A = 1, B = 3),
                y = c(X = 2, Y = 2, Z = 3, C = 1, A = 3, B = 3)
                )
)

# Create a nice visualization of the DAG
ggdag(causal_salad_dag4) + 
  theme_dag() +
  labs(title = "DAG: Experimental Causal Structure")

Directed Acyclic Graph with X as exposure, Y as outcome, and experimental causal structure


Executive Summary: Understanding Experimental Causal Structures

An experimental causal structure represents a scenario where the exposure variable (X) is manipulated independently, as in a randomized controlled trial. In this DAG, we have:

  1. Independence of X: The exposure X has no incoming arrows, indicating it is randomly assigned or experimentally manipulated
  2. Direct effect on Y: X directly causes Y, which is the causal effect we want to measure
  3. Other influences on Y: Z, C, and B all directly cause Y, representing other factors that affect the outcome
  4. No confounding of X and Y: There are no backdoor paths between X and Y, as no common causes exist

Why is this Structure Important?

In experimental causal structures:

  1. No adjustment necessary: The causal effect of X on Y can be estimated without adjusting for any variables
  2. Unbiased estimation: The effect estimate is unbiased due to the independence of X from other causal factors
  3. Maximum statistical power: No adjustment variables means greater statistical efficiency

Minimal Sufficient Adjustment Sets

For this DAG, the minimal sufficient adjustment set is empty {}:

  • No adjustment is necessary because X is independent of all other causes of Y
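This can be checked directly with the dagitty package. The snippet below re-specifies the section's DAG from scratch (the object name `g` is illustrative) and queries its adjustment sets:

```r
library(dagitty)

# Re-specify this section's DAG in dagitty syntax
g <- dagitty("dag {
  X [exposure] ; Y [outcome]
  X -> Y ; Z -> Y ; C -> Y ; B -> Y ; A -> Z ; B -> Z
}")

# With X randomized (no parents), no backdoor paths exist
adjustmentSets(g)  # prints {}: the empty set is sufficient
```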

Real-World Example

Consider a randomized controlled trial testing a new medication (X) on blood pressure (Y):

  • Education about health (Z) affects blood pressure (Y) directly
  • Diet (C) affects blood pressure (Y) directly
  • Exercise habits (B) affect both education about health (Z) and blood pressure (Y) directly
  • Family history (A) affects education about health (Z)

Because the medication is randomly assigned, researchers do not need to control for any variables to estimate its causal effect on blood pressure.

How to Handle Experimental Structures

  1. Verify the true independence of the exposure variable
  2. Estimate the causal effect directly without adjustment
  3. Consider adjusting for prognostic factors of Y to increase precision (not to reduce bias)
  4. Be aware that while no adjustment is necessary for identifying the causal effect, adjustment might still be beneficial for increasing the precision of the estimate

Experimental causal structures represent the gold standard for causal inference because they eliminate confounding by design, allowing for straightforward estimation of causal effects.

Executive Summary: The Randomization Principle

This DAG illustrates the randomization principle, a fundamental concept in causal inference:

  1. X is randomized: The exposure has no parents in the graph, meaning it is assigned independently of any other variables

  2. No backdoor paths exist: Since X has no incoming arrows, there are no backdoor paths between X and Y

  3. Causal effect is directly identifiable: The association between X and Y represents the true causal effect without bias

Why Randomization Solves the Confounding Problem

The randomization principle works because:

  1. Breaks all potential confounding relationships: By assigning X independently, we sever any connections that would create backdoor paths

  2. Creates balance on all variables: Randomization balances both measured and unmeasured variables between treatment groups

  3. Eliminates selection bias: Random assignment prevents systematic differences between groups
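A short simulation illustrates the point. All coefficients and the data-generating process below are invented for illustration, with the true effect of X on Y set to 2:

```r
set.seed(42)
n <- 50000

# Simulate the DAG: A and B cause Z; Z, C, and B cause Y
A <- rnorm(n); B <- rnorm(n); C <- rnorm(n)
Z <- A + B + rnorm(n)
X <- rbinom(n, 1, 0.5)              # random assignment: X has no parents
Y <- 2 * X + Z + C + B + rnorm(n)   # true causal effect of X is 2

# The unadjusted regression recovers the true effect
coef(lm(Y ~ X))["X"]                # close to 2
```

Because X is assigned independently of A, B, C, and Z, the simple difference in means is unbiased even though many other variables drive Y.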

Real-World Application

In a clinical trial testing a new drug (X) on patient recovery (Y):

  • Randomization ensures the drug is assigned independently of:
    • Patient characteristics (age, genetics, comorbidities)
    • Disease severity
    • Other treatments
  • Any observed difference in recovery between treatment and control groups can be attributed to the drug itself

Practical Considerations

  1. Perfect randomization: In practice, randomization can only be approximated, never perfectly achieved

  2. Small sample considerations: In small samples, random imbalances can still occur

  3. Adjustment for precision: While not necessary for unbiased estimation, adjusting for prognostic factors can increase statistical precision
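The precision gain from adjusting for prognostic factors can be seen in the same kind of simulation (again a hypothetical data-generating process, with the effect of X fixed at 2):

```r
set.seed(7)
n <- 5000
A <- rnorm(n); B <- rnorm(n); C <- rnorm(n)
Z <- A + B + rnorm(n)
X <- rbinom(n, 1, 0.5)
Y <- 2 * X + Z + C + B + rnorm(n)

# Both models are unbiased; the adjusted one has a smaller standard error
se_unadj <- summary(lm(Y ~ X))$coefficients["X", "Std. Error"]
se_adj   <- summary(lm(Y ~ X + Z + C + B))$coefficients["X", "Std. Error"]
c(unadjusted = se_unadj, adjusted = se_adj)
```

Adjusting for Z, C, and B soaks up residual variance in Y, shrinking the standard error of the X coefficient without changing its expectation.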

Understanding the randomization principle explains why randomized controlled trials are considered the gold standard for causal inference: they create a DAG structure in which the causal effect is directly identifiable without adjustment for confounding.


2. Results

2.1 Table of Key DAG Properties

Show the code
DT::datatable(
  properties_df,
  rownames = FALSE,
  options = list(
    pageLength = 10,
    ordering = TRUE,
    searching = FALSE
  ),
  class = 'cell-border stripe'
)
Table 1: Key Properties of the DAG


2.2 Table of Conditional Independencies

Show the code
DT::datatable(
  independencies_df,
  rownames = FALSE,
  options = list(
    pageLength = 10,
    ordering = TRUE,
    searching = FALSE
  ),
  class = 'cell-border stripe'
)


2.3 Table of Paths Between X and Y

Show the code
DT::datatable(
  paths_df,
  rownames = FALSE,
  options = list(
    pageLength = 10,
    ordering = TRUE,
    searching = FALSE
  ),
  class = 'cell-border stripe'
)
Table 2: All Paths Between X and Y


2.4 Table of Ancestors and Descendants

Show the code
DT::datatable(
  ancestors_descendants_df,
  rownames = FALSE,
  options = list(
    pageLength = 10,
    ordering = TRUE,
    searching = FALSE
  ),
  class = 'cell-border stripe'
)
# kable(ancestors_descendants_df) %>%
#   kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Table 3: Ancestors and Descendants


2.5 Table of D-Separation Results

Show the code
DT::datatable(
  d_sep_df,
  rownames = FALSE,
  options = list(
    pageLength = 10,
    ordering = TRUE,
    searching = FALSE
  ),
  class = 'cell-border stripe'
)
# kable(d_sep_df) %>%
#   kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Table 4: D-Separation Test Results


2.6 Table of Impact of Adjustments

Show the code
DT::datatable(
  adjustment_effect_df,
  rownames = FALSE,
  options = list(
    pageLength = 10,
    ordering = TRUE,
    searching = FALSE
  ),
  class = 'cell-border stripe'
)
# kable(adjustment_effect_df) %>%
#   kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Table 5: Effect of Different Adjustment Sets


2.7 Table of Unmeasured Confounding Impact

Show the code
DT::datatable(
  unmeasured_df,
  rownames = FALSE,
  options = list(
    pageLength = 10,
    ordering = TRUE,
    searching = FALSE
  ),
  class = 'cell-border stripe'
)
# kable(unmeasured_df) %>%
#   kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Table 6: Impact of Treating Variables as Unmeasured


2.8 Table of Instrumental Variables

Show the code
DT::datatable(
  instruments_df,
  rownames = FALSE,
  options = list(
    pageLength = 10,
    ordering = TRUE,
    searching = FALSE
  ),
  class = 'cell-border stripe'
)
# kable(instruments_df) %>%
#   kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Table 7: Potential Instrumental Variables


3. Visualizing Status, Adjustment Sets and Paths with ggdag

Show the code
# Create tidy dagitty object with ggdag positioning
experiment_dag_tidy <- tidy_dagitty(experiment_dag)

# Status plot showing exposure/outcome
ggdag_status(experiment_dag_tidy) +
  theme_dag() +
  labs(title = "Status Plot: Exposure and Outcome")

# Adjustment set visualization
ggdag_adjustment_set(experiment_dag_tidy) +
  theme_dag() +
  labs(title = "Adjustment Sets for X → Y")

# Paths visualization
ggdag_paths(experiment_dag_tidy) +
  theme_dag() +
  labs(title = "All Paths between X and Y")

Status Plot: Exposure and Outcome

Adjustment Sets for X → Y

All Paths between X and Y

Different visualizations of the DAG


4. Interpretation and Discussion

4.1 Key Insights about this DAG Structure

This experimental DAG illustrates several important causal principles:

  1. Independence of Exposure Variable X
    • X has no incoming arrows (no parents)
    • This represents random assignment or experimental manipulation
    • X is independent of all other variables in the system
  2. Direct Effect of X on Y
    • There is a direct causal path from X to Y
    • This represents the causal effect we want to estimate
  3. No Backdoor Paths Between X and Y
    • There are no paths starting with an arrow pointing into X
    • No confounding exists between X and Y
    • This is the key feature of experimental designs
  4. Multiple Other Causes of Y
    • Z, C, and B directly affect Y
    • These represent other determinants of the outcome
    • These do not bias the X-Y relationship due to independence of X

4.2 Proper Identification Strategy

To identify the causal effect of X on Y:

  • No adjustment is necessary due to the experimental design
  • The minimal sufficient adjustment set is empty { }
  • The unadjusted association between X and Y is an unbiased estimate of the causal effect
  • Adjusting for prognostic variables of Y may increase precision without affecting bias


Key DAG Terms and Concepts

DAG (Directed Acyclic Graph): A graphical representation of causal relationships where arrows indicate the direction of causality, and no variable can cause itself through any path (hence “acyclic”).

Exposure: The variable whose causal effect we want to estimate (often called the treatment or independent variable).

Outcome: The variable we are interested in measuring the effect on (often called the dependent variable).

Confounder: A variable that influences both the exposure and the outcome, potentially creating a spurious association between them.

Mediator: A variable that lies on the causal pathway between the exposure and outcome (exposure → mediator → outcome).

Collider: A variable that is influenced by both the exposure and the outcome, or by two variables on a path (e.g., A → C ← B).

Backdoor path: Any non-causal path connecting the exposure to the outcome that creates a spurious association.
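A minimal toy graph (not the DAG analyzed above; U, X, and Y here are generic names) makes the confounder and backdoor-path definitions concrete:

```r
library(dagitty)

# U confounds X and Y, creating the backdoor path X <- U -> Y
g_conf <- dagitty("dag { U -> X ; U -> Y ; X -> Y }")

paths(g_conf, "X", "Y")$paths     # the causal path X -> Y and the backdoor path X <- U -> Y
adjustmentSets(g_conf, "X", "Y")  # { U }: adjusting for U blocks the backdoor path
```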

Understanding the Analysis Tables

1. Key Properties Table

This table provides a high-level overview of the DAG structure and key causal features:

  • Acyclic DAG: Confirms the graph has no cycles (a prerequisite for valid causal analysis)

  • Causal effect identifiable: Indicates whether the causal effect can be estimated from observational data

  • Number of paths: Total number of paths connecting exposure and outcome

  • Number of backdoor paths: Paths creating potential confounding that need to be blocked

  • Direct effect exists: Whether there is a direct causal link from exposure to outcome

  • Potential mediators: Variables that may mediate the causal effect

  • Number of adjustment sets: How many different sets of variables could be adjusted for

  • Minimal adjustment sets: The smallest sets of variables that block all backdoor paths

2. Conditional Independencies Table

Shows the implied conditional independencies in the DAG - pairs of variables that should be statistically independent when conditioning on specific other variables. These can be used to test the validity of your DAG against observed data.
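For the DAG in this section, the implied independencies can be listed with `impliedConditionalIndependencies()`; the graph is re-specified inline so the sketch is self-contained:

```r
library(dagitty)

g <- dagitty("dag {
  X [exposure] ; Y [outcome]
  X -> Y ; Z -> Y ; C -> Y ; B -> Y ; A -> Z ; B -> Z
}")

# Each line is a testable implication of the DAG,
# e.g. randomized X should be marginally independent of A, B, C, and Z
impliedConditionalIndependencies(g)
```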

3. Paths Analysis Table

Enumerates all paths connecting the exposure to the outcome:

  • Path: The specific variables and connections in each path

  • Length: Number of edges in the path

  • IsBackdoor: Whether this is a backdoor path (potential source of confounding)

  • IsDirected: Whether this is a directed path from exposure to outcome

Testing whether these paths are open or closed under different conditioning strategies is crucial for causal inference.
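In this particular DAG the enumeration is trivial, as `dagitty::paths()` confirms (graph re-specified inline):

```r
library(dagitty)

g <- dagitty("dag {
  X [exposure] ; Y [outcome]
  X -> Y ; Z -> Y ; C -> Y ; B -> Y ; A -> Z ; B -> Z
}")

# Because X has no parents, the direct edge is the only path between X and Y
p <- paths(g, "X", "Y")
p$paths  # "X -> Y"
p$open   # TRUE: the causal path is open, and there is nothing to block
```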

4. Ancestors and Descendants Table

Shows which variables can causally affect (ancestors) or be affected by (descendants) each variable in the DAG:

  • Understanding ancestry relationships helps identify potential confounders

  • Descendants should not be controlled for as this may introduce bias
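These relations can be queried directly; note that dagitty counts a variable among its own ancestors and descendants:

```r
library(dagitty)

g <- dagitty("dag {
  X [exposure] ; Y [outcome]
  X -> Y ; Z -> Y ; C -> Y ; B -> Y ; A -> Z ; B -> Z
}")

ancestors(g, "Y")    # Y plus all of its causes: X, Z, C, B, A
descendants(g, "X")  # only X and Y: nothing lies downstream of Y in this graph
```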

5. D-Separation Results Table

Shows whether exposure and outcome are conditionally independent (d-separated) when conditioning on different variable sets:

  • Is_D_Separated = Yes: This set of conditioning variables blocks all non-causal paths

  • Is_D_Separated = No: Some non-causal association remains

This helps identify sufficient adjustment sets for estimating causal effects.
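A few `dseparated()` queries on this DAG (re-specified inline) show both the benefit of randomization and the danger of conditioning on a collider; Z is a collider on the path A → Z ← B:

```r
library(dagitty)

g <- dagitty("dag {
  X [exposure] ; Y [outcome]
  X -> Y ; Z -> Y ; C -> Y ; B -> Y ; A -> Z ; B -> Z
}")

dseparated(g, "X", "A", list())       # TRUE: randomized X is independent of A
dseparated(g, "A", "Y", list())       # FALSE: the chain A -> Z -> Y is open
dseparated(g, "A", "Y", "Z")          # FALSE: conditioning on collider Z opens A -> Z <- B -> Y
dseparated(g, "A", "Y", c("Z", "B"))  # TRUE: adding B blocks the path that Z opened
```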

6. Impact of Adjustments Table

Shows how different adjustment strategies affect the identification of causal effects:

  • Total_Paths: Total number of paths between exposure and outcome

  • Open_Paths: Number of paths that remain open after adjustment

Ideally, adjusting for the right variables leaves only the causal paths open.

7. Unmeasured Confounding Impact Table

Simulates the effect of being unable to measure certain variables:

  • Original_Adjustment_Sets: Number of valid adjustment sets with all variables measured

  • Adjusted_Sets_When_Unmeasured: Number of valid adjustment sets when this variable is unmeasured

This helps identify which variables are most critical to measure for valid causal inference.
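This sensitivity check can be mimicked by marking a variable as latent and re-running the adjustment-set query; Z is chosen arbitrarily here as the unmeasured variable:

```r
library(dagitty)

g <- dagitty("dag {
  X [exposure] ; Y [outcome]
  X -> Y ; Z -> Y ; C -> Y ; B -> Y ; A -> Z ; B -> Z
}")

# Treat Z as unmeasured, then ask whether the effect is still identifiable
latents(g) <- "Z"
adjustmentSets(g)  # still {}: Z never confounded X and Y, so losing it costs nothing
```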

8. Instrumental Variables Table

Lists potential instrumental variables - variables that affect the exposure but have no direct effect on the outcome except through the exposure. These are useful for causal inference when confounding is present, especially in methods like instrumental variable estimation.
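dagitty can search for instruments automatically; for this DAG (re-specified inline) the search comes back empty, because nothing causes X:

```r
library(dagitty)

g <- dagitty("dag {
  X [exposure] ; Y [outcome]
  X -> Y ; Z -> Y ; C -> Y ; B -> Y ; A -> Z ; B -> Z
}")

# No variable affects X, so no instrumental variables exist in this graph
instrumentalVariables(g, exposure = "X", outcome = "Y")
```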

How to Use This Analysis for Causal Inference

  1. Identify minimal sufficient adjustment sets: These are the variables you should control for in your analysis to remove confounding.

  2. Avoid conditioning on colliders: This can introduce bias. Check the paths and d-separation results to ensure your adjustment strategy doesn’t open non-causal paths.

  3. Validate your DAG: Use the implied conditional independencies to test your causal assumptions against observed data.

  4. Assess sensitivity to unmeasured confounding: The unmeasured confounding analysis helps understand how robust your conclusions might be.

  5. Consider mediation analysis: If mediators are present, you might want to decompose total effects into direct and indirect components.

  6. Look for instrumental variables: These can help establish causality even in the presence of unmeasured confounding.

Remember that the validity of any causal inference depends on the correctness of your DAG - it represents your causal assumptions about the data-generating process, which should be based on substantive domain knowledge.